Abstract - This paper considers the problem of blindly extracting
data embedded over a wide band in a spectrum (transform) domain of a
digital medium (image, audio, video). We first develop a
multi-signature iterative generalized least-squares (M-IGLS) core
procedure to seek unknown data hidden in hosts via multi-signature
direct-sequence spread-spectrum embedding. Neither the original host
nor the embedding signatures are assumed available. Then,
cross-correlation enhanced M-IGLS (CC-M-IGLS), a procedure described
herein in detail that is based on statistical analysis of repeated
independent M-IGLS processing of the host, is seen to offer most
effective hidden message recovery. Experimental studies on images show
that the proposed CC-M-IGLS algorithm can achieve recovery probability
of error close to what may be attained with known embedding signatures
and host autocorrelation matrix.

Index Terms - Authentication, blind detection, covert communications,
data hiding, information hiding, spread-spectrum embedding,
steganalysis, steganography, watermarking.

I.  Introduction 

Digital data embedding in digital media is an information technology
field of rapidly growing commercial, as well as national security,
interest. Applications of digital data embedding include
authentication in its various forms (for example, permanent “iron
branding” to show ownership, fragile watermarking to detect future
tampering, hidden low-probability-to-detect identification for
confidential data validation, etc.) and steganography, whose purpose is
to establish covert communication between trusting parties. The broad
common objective of steganographic applications is a satisfactory
tradeoff between hidden data resistance to noise/disturbance
(robustness), information delivery rate (payload), and low host
distortion for concealment purposes.

The countermeasure technology to data hiding is frequently referred to
as steganalysis. Steganalysis can be classified into two categories,
passive and active. The primary task of passive steganalysis is to
decide the presence or absence of hidden messages in given media
objects [1]. In contrast, active steganalysis refers to the effort of
extracting the actual hidden data. While passive steganalysis has been
intensively investigated in the past few years, active steganalysis is
a relatively new branch of research [2].

In this work, we focus our attention on active spread-spectrum (SS)
steganalysis. In particular, we aim to recover blindly data hidden in
hosts via (multi-signature) direct-sequence SS embedding [3]-[6].
Neither the original host nor the embedding signatures (spreading
sequences) are known (fully blind SS steganalysis). In blind active SS
steganalysis the unknown host acts as a source of
interference/disturbance to the data to be extracted and, in a way, the
problem parallels blind signal separation (BSS) applications as they
arise in the fields of array processing, biomedical signal processing,
and code-division multiple-access (CDMA) communication systems. Under
the assumption that the embedded secret messages are independent
identically distributed (i.i.d.) random sequences and independent of
the cover host, independent component analysis (ICA), one particular
family of BSS methods, may be utilized to approach the hidden data
extraction problem [2], [7]. However, ICA-based BSS algorithms degrade
rapidly in the presence of correlated signal interference, as is
exactly the case in SS image/video/audio embedding. In [8], Gkizeli et
al. developed an iterative generalized least squares (IGLS) procedure
to blindly recover unknown messages hidden in image hosts via SS
embedding. The algorithm has low complexity and strong recovery
performance. However, the scheme is designed solely for
single-signature SS embedding, where messages are hidden with one
signature only, and is not generalizable to the multi-signature case.
Realistically, a steganographer would favor multi-signature SS
embedding to increase security and payload rate.

1 Corresponding author. Approved for Public Release; distribution
unlimited: 88ABW-2011-3182 dated 06 June 2011.

In this paper, we develop a new multi-signature iterative generalized
least squares (M-IGLS) SS steganalysis algorithm for hidden data
extraction. For improved recovery performance, and in particular for
small hidden messages that pose the greatest challenge, we propose an
algorithmic upgrade referred to as cross-correlation enhanced M-IGLS
(CC-M-IGLS). CC-M-IGLS relies on statistical analysis of independent
M-IGLS executions on the host, and experimental studies demonstrate
hidden data recovery with probability of error close to what may be
attained with known embedding signatures and known original host
autocorrelation matrix.

II.  Multi-signature  SS  Embedding  and 
Steganalysis  Problem  Formulation 

Consider a host image $\mathbf{H} \in \mathcal{M}^{N_1 \times N_2}$
where $\mathcal{M}$ is the finite image alphabet and $N_1 \times N_2$
is the image size in pixels. Without loss of generality, the image
$\mathbf{H}$ is partitioned into $M$ local non-overlapping blocks of
size $\frac{N_1 N_2}{M}$. Each block, $\mathbf{H}_1, \mathbf{H}_2,
\ldots, \mathbf{H}_M$, is to carry $K$ hidden information bits coming,
potentially, from $K$ distinct messages. Embedding is performed in a
2-D transform domain $\mathcal{T}$ (such as the discrete cosine
transform, a wavelet transform, etc.). After transform calculation and
vectorization (for example by conventional zig-zag scanning), we obtain
$\mathcal{T}(\mathbf{H}_m) \in \mathbb{R}^{\frac{N_1 N_2}{M}}$,
$m = 1, 2, \ldots, M$. From the transform domain vectors
$\mathcal{T}(\mathbf{H}_m)$ we choose a fixed subset of
$L \le \frac{N_1 N_2}{M}$ coefficients (bins) to form the final host
vectors $\mathbf{x}(m) \in \mathbb{R}^{L}$, $m = 1, 2, \ldots, M$. It
is common and appropriate to avoid the dc coefficient (if applicable)
due to high perceptual sensitivity in changes of the dc value.
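The host-vector construction described above can be sketched as follows, under stated assumptions: an 8 x 8 block DCT (the transform used later in the experimental studies), conventional JPEG-style zig-zag scanning, and removal of the dc bin only, so that $L = 63$. The function names `dct_matrix`, `zigzag_indices`, and `host_vectors` are hypothetical helpers introduced for illustration; they are not part of the original work.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in conventional zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Odd anti-diagonals are scanned with increasing row, even ones with increasing column.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def host_vectors(image, n=8):
    """Return the L x M matrix [x(1) ... x(M)], with L = n*n - 1 (dc bin dropped).

    Assumes the image dimensions are multiples of n; leftover pixels are ignored.
    """
    N1, N2 = image.shape
    C = dct_matrix(n)
    rows, cols = (np.array(t) for t in zip(*zigzag_indices(n)[1:]))  # skip dc
    X = []
    for r0 in range(0, N1 - n + 1, n):
        for c0 in range(0, N2 - n + 1, n):
            T = C @ image[r0:r0 + n, c0:c0 + n] @ C.T   # 2-D DCT of one block
            X.append(T[rows, cols])                      # zig-zag vectorization
    return np.array(X).T

# Toy usage on a synthetic 16 x 16 "image" (M = 4 blocks)
image = np.arange(256, dtype=float).reshape(16, 16)
X = host_vectors(image)
```

Building the DCT from an explicit orthonormal matrix keeps the sketch dependent on NumPy only; any equivalent 2-D DCT routine could be substituted.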


A.  Multi-signature  SS  Embedding 

The $K$ distinct message bit sequences $\{b_k(m)\}_{m=1}^{M}$,
$k = 1, 2, \ldots, K$, $b_k(m) \in \{\pm 1\}$, are hidden in the
transform-domain host vectors $\{\mathbf{x}(m)\}_{m=1}^{M}$ via
additive SS embedding by means of $K$ spreading sequences (signatures)
$\mathbf{s}_k \in \mathbb{R}^{L}$, $\|\mathbf{s}_k\| = 1$,
$k = 1, 2, \ldots, K$,

$$\mathbf{y}(m) = \sum_{k=1}^{K} A_k b_k(m) \mathbf{s}_k + \mathbf{x}(m) + \mathbf{n}(m), \quad m = 1, 2, \ldots, M, \tag{1}$$

with corresponding amplitudes $A_k > 0$, $k = 1, \ldots, K$; for the
sake of generality, $\mathbf{n}(m) \sim \mathcal{N}(\mathbf{0},
\sigma_n^2 \mathbf{I}_L)$ represents potential external white Gaussian
noise$^1$ with variance $\sigma_n^2$. It is assumed that $b_k(m)$
behave as equi-probable binary random variables that are independent in
time, $m = 1, \ldots, M$, and across messages, $k = 1, \ldots, K$. The
contribution of each individual embedded message bit $b_k$ to the
composite signal is $A_k b_k \mathbf{s}_k$ and the mean-squared
distortion to the original host data $\mathbf{x}$ due to the embedded
$k$th message alone is

$$\mathcal{D}_k = E\{\|A_k \mathbf{s}_k b_k\|^2\} = A_k^2, \quad k = 1, 2, \ldots, K. \tag{2}$$

The intended recipient of the $k$th message can perform hidden bit
detection by looking at the sign of the output of the
minimum-mean-square-error (MMSE) filter $\mathbf{w}_{\mathrm{MMSE},k}$:

$$\hat{b}_k(m) = \mathrm{sgn}\{\mathbf{w}_{\mathrm{MMSE},k}^{T} \mathbf{y}(m)\} = \mathrm{sgn}\{\mathbf{s}_k^{T} \mathbf{R}_y^{-1} \mathbf{y}(m)\} \tag{3}$$

where $\mathbf{R}_y$ is the autocorrelation matrix of the stego vectors
$\{\mathbf{y}(m)\}_{m=1}^{M}$,

$$\mathbf{R}_y = E\{\mathbf{y}\mathbf{y}^{T}\} = \mathbf{R}_x + \sum_{k=1}^{K} A_k^2 \mathbf{s}_k \mathbf{s}_k^{T} + \sigma_n^2 \mathbf{I}_L. \tag{4}$$

The autocorrelation matrix $\mathbf{R}_y$ can be estimated by sample
averaging over the finite set of $M$ stego data,
$\hat{\mathbf{R}}_y = \frac{1}{M} \sum_{m=1}^{M} \mathbf{y}(m)\mathbf{y}(m)^{T}$.
Using $\hat{\mathbf{R}}_y$ in (3), we obtain what is known as the
sample-matrix-inversion MMSE (SMI-MMSE) detector implementation.
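As an illustration of (1) and of the SMI-MMSE detector in (3), the following minimal sketch embeds $K$ bit streams in a synthetic correlated "host" and recovers them with known signatures. The host model (a random linear mixture of white Gaussian data), the sizes, the amplitudes, and the function names `ss_embed` and `smi_mmse_detect` are all assumptions made for this example, not values or code from the original work.

```python
import numpy as np

def ss_embed(X, S, A, B, noise_var=0.0, rng=None):
    """Additive multi-signature SS embedding, eq. (1).

    X: (L, M) host vectors as columns; S: (L, K) unit-norm signatures;
    A: (K,) amplitudes; B: (K, M) bits in {-1, +1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = np.sqrt(noise_var) * rng.standard_normal(X.shape)   # white Gaussian noise
    return (S * A) @ B + X + N        # column k of S is scaled by A[k]

def smi_mmse_detect(Y, S):
    """SMI-MMSE decisions sgn(s_k^T R_y^{-1} y(m)) with the sample estimate of R_y."""
    M = Y.shape[1]
    R_y = Y @ Y.T / M                 # sample autocorrelation of the stego data
    return np.sign(S.T @ np.linalg.solve(R_y, Y))

# Toy run (all parameters below are illustrative assumptions)
rng = np.random.default_rng(0)
L, K, M = 16, 2, 2000
X = rng.standard_normal((L, L)) @ rng.standard_normal((L, M))  # correlated host
S = rng.standard_normal((L, K))
S /= np.linalg.norm(S, axis=0)        # enforce ||s_k|| = 1
A = np.array([3.0, 3.0])
B = rng.choice([-1.0, 1.0], size=(K, M))
Y = ss_embed(X, S, A, B, noise_var=1.0, rng=rng)
B_hat = smi_mmse_detect(Y, S)
ber = np.mean(B_hat != B)             # empirical bit-error rate over all streams
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of the sample autocorrelation matrix.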


B.  Formulation  of  Active  Steganalysis  Problem 

We assume that the active data extraction analyst has the ability to
obtain transform domain stego data in the form of $\mathbf{y}(m)$ in
(1) after performing appropriate image partition, transform, and
coefficient selection$^2$ on the image classified as stego by passive
steganalysis. We denote the combined “disturbance” to the hidden data
(host plus noise) by $\mathbf{z}(m) = \mathbf{x}(m) + \mathbf{n}(m)$.
Then, SS embedding by (1) can be rewritten as

$$\mathbf{y}(m) = \sum_{k=1}^{K} A_k b_k(m) \mathbf{s}_k + \mathbf{z}(m), \quad m = 1, \ldots, M, \tag{5}$$

where $\mathbf{z}(m)$ is modeled as a sequence of zero-mean (without
loss of generality) vectors with autocovariance matrix
$\mathbf{R}_z = E\{\mathbf{z}\mathbf{z}^{T}\} = \mathbf{R}_x + \sigma_n^2 \mathbf{I}_L$.
Let $\mathbf{v}_k = A_k \mathbf{s}_k \in \mathbb{R}^{L}$,
$k = 1, \ldots, K$, be amplitude-including embedding signatures. Then,
we can further rewrite SS embedding as

$$\mathbf{y}(m) = \sum_{k=1}^{K} b_k(m) \mathbf{v}_k + \mathbf{z}(m) \tag{6}$$

$$= \mathbf{V}\mathbf{b}(m) + \mathbf{z}(m), \quad m = 1, \ldots, M, \tag{7}$$

where $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_K] \in \mathbb{R}^{L \times K}$
is the amplitude-including signature matrix and
$\mathbf{b}(m) \in \{\pm 1\}^{K \times 1}$ is the vector of bits
embedded in the $m$th host block. For notational simplicity, we can
write the whole stego image data as one matrix

$$\mathbf{Y} = \mathbf{V}\mathbf{B} + \mathbf{Z} \tag{8}$$

where $\mathbf{Y} = [\mathbf{y}(1)\ \mathbf{y}(2)\ \ldots\ \mathbf{y}(M)] \in \mathbb{R}^{L \times M}$,
$\mathbf{B} = [\mathbf{b}(1)\ \mathbf{b}(2)\ \ldots\ \mathbf{b}(M)] \in \{\pm 1\}^{K \times M}$, and
$\mathbf{Z} = [\mathbf{z}(1)\ \mathbf{z}(2)\ \ldots\ \mathbf{z}(M)] \in \mathbb{R}^{L \times M}$.

Our objective is to blindly extract the unknown hidden data
$\mathbf{B}$ from the stego data $\mathbf{Y}$ without prior knowledge
of the embedding signatures $\mathbf{s}_k$ and amplitudes $A_k$,
$k = 1, \ldots, K$, in
$\mathbf{V} = [A_1 \mathbf{s}_1, \ldots, A_K \mathbf{s}_K]$ or the host
itself $\mathbf{x}(1), \ldots, \mathbf{x}(M)$ in
$\mathbf{Z} = [\mathbf{x}(1) + \mathbf{n}(1), \ldots, \mathbf{x}(M) + \mathbf{n}(M)]$.

1 Additive white Gaussian noise is frequently viewed as a suitable
model for quantization errors, channel transmission disturbances,
and/or image processing attacks.

2 Host image partition may be estimated by examining the difference
between neighboring pixels [9]. For each investigated transform, all
coefficients (except the dc value) may be considered.

III.  Hidden  Data  Extraction 

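If $\mathbf{Z}$ were to be modeled as Gaussian distributed, the joint
maximum-likelihood (ML) estimator of $\mathbf{V}$ and detector of
$\mathbf{B}$ would be

$$\hat{\mathbf{V}}, \hat{\mathbf{B}} = \arg\min_{\substack{\mathbf{B} \in \{\pm 1\}^{K \times M} \\ \mathbf{V} \in \mathbb{R}^{L \times K}}} \left\| \mathbf{R}_z^{-\frac{1}{2}} (\mathbf{Y} - \mathbf{V}\mathbf{B}) \right\|_F^2 \tag{9}$$

where multiplication by $\mathbf{R}_z^{-\frac{1}{2}}$ can be
interpreted as prewhitening of the compound observation data. If
Gaussianity of $\mathbf{Z}$ is not to be invoked, then (9) is simply
referred to as the joint generalized least-squares (GLS) solution.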

A.  Multi-signature  Iterative  Generalized  Least-Squares  Procedure

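In any case, unfortunately, joint estimation of $\mathbf{V}$ and
detection of $\mathbf{B}$ by (9) has complexity exponential in $KM$. To
manage the computational complexity, we attempt to reach a quality
approximation of the solution of (9) by alternating generalized
least-squares estimates of $\mathbf{V}$ and $\mathbf{B}$, iteratively,
as described below.

Pretend $\mathbf{B}$ is known; the generalized least-squares estimate
of $\mathbf{V}$ is

$$\hat{\mathbf{V}}_{\mathrm{GLS}} = \arg\min_{\mathbf{V} \in \mathbb{R}^{L \times K}} \left\| \mathbf{R}_z^{-\frac{1}{2}} (\mathbf{Y} - \mathbf{V}\mathbf{B}) \right\|_F^2 = \mathbf{Y}\mathbf{B}^{T} (\mathbf{B}\mathbf{B}^{T})^{-1}. \tag{10}$$

Pretend, in turn, that $\mathbf{V}$ is known; then, the least-squares
estimate of $\mathbf{B}$ over the real field is

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{real}} = \arg\min_{\mathbf{B} \in \mathbb{R}^{K \times M}} \left\| \mathbf{R}_z^{-\frac{1}{2}} (\mathbf{Y} - \mathbf{V}\mathbf{B}) \right\|_F^2 = (\mathbf{V}^{T} \mathbf{R}_z^{-1} \mathbf{V})^{-1} \mathbf{V}^{T} \mathbf{R}_z^{-1} \mathbf{Y}. \tag{11}$$

Observing that

$$(\mathbf{V}^{T} \mathbf{R}_z^{-1} \mathbf{V})^{-1} \mathbf{V}^{T} \mathbf{R}_z^{-1} = (\mathbf{V}^{T} \mathbf{R}_y^{-1} \mathbf{V})^{-1} \mathbf{V}^{T} \mathbf{R}_y^{-1}, \tag{12}$$

we rewrite

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{real}} = (\mathbf{V}^{T} \mathbf{R}_y^{-1} \mathbf{V})^{-1} \mathbf{V}^{T} \mathbf{R}_y^{-1} \mathbf{Y} \tag{13}$$

and suggest the approximate binary message solution

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{binary}} = \arg\min_{\mathbf{B} \in \{\pm 1\}^{K \times M}} \left\| \mathbf{R}_z^{-\frac{1}{2}} (\mathbf{Y} - \mathbf{V}\mathbf{B}) \right\|_F^2 \approx \mathrm{sgn}\{(\mathbf{V}^{T} \mathbf{R}_y^{-1} \mathbf{V})^{-1} \mathbf{V}^{T} \mathbf{R}_y^{-1} \mathbf{Y}\}. \tag{14}$$

The proofs of (10), (11), and (12) are omitted due to lack of space.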
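Although the proof of (12) is omitted, the identity is easy to check numerically. The sketch below assumes the model-implied relation $\mathbf{R}_y = \mathbf{R}_z + \mathbf{V}\mathbf{V}^{T}$, which follows from (4) with $\mathbf{v}_k = A_k \mathbf{s}_k$ and independent equi-probable bits; the equality of the two filters is then a consequence of the matrix inversion lemma. The random test matrices and the helper name `gls_filter` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K = 8, 3
V = rng.standard_normal((L, K))        # arbitrary amplitude-including signatures
G = rng.standard_normal((L, L))
R_z = G @ G.T + np.eye(L)              # any symmetric positive-definite R_z
R_y = R_z + V @ V.T                    # model-implied stego autocorrelation

def gls_filter(V, R):
    """Return (V^T R^{-1} V)^{-1} V^T R^{-1} for symmetric positive-definite R."""
    RinvV = np.linalg.solve(R, V)                  # R^{-1} V
    return np.linalg.solve(V.T @ RinvV, RinvV.T)   # uses V^T R^{-1} = (R^{-1} V)^T

lhs = gls_filter(V, R_z)               # left-hand side of (12)
rhs = gls_filter(V, R_y)               # right-hand side of (12)
max_gap = np.max(np.abs(lhs - rhs))    # should be at round-off level
```

The practical value of (12) is that $\mathbf{R}_y$ can be estimated directly from the observed stego data, whereas $\mathbf{R}_z$ cannot.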

The multi-signature iterative generalized least-squares (M-IGLS)
procedure suggested by the two equations (10) and (14) is now
straightforward. Initialize $\mathbf{B}$ arbitrarily and alternate
iteratively between (10) and (14) to obtain at each step conditionally
generalized least squares estimates of one matrix parameter given the
other. Stop when convergence is observed. Notice that (14) requires
knowledge of the autocorrelation matrix of the stego data
$\mathbf{R}_y$, which can be estimated by sample averaging over the
received data observations,
$\hat{\mathbf{R}}_y = \frac{1}{M} \sum_{m=1}^{M} \mathbf{y}(m)\mathbf{y}(m)^{T}$.
The M-IGLS SS steganalysis algorithm is summarized in Table I.
Superscripts denote iteration index. For the sake of mathematical
accuracy, we emphasize that there is always a global message sign/phase
ambiguity present when one considers joint data extraction and
signature identification (i.e., for each whole extracted message vector
$\hat{\mathbf{b}}_i$, $i = 1, \ldots, K$, is it $\mathbf{b}_i$ or
$-\mathbf{b}_i$?). The sign ambiguity problem can be overcome with a
few known or guessed data symbols for sign correction.
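The alternation between (10) and (14) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `m_igls`, the optional `B0` argument, the iteration cap `max_iter`, the tie-breaking rule for exact zeros, and the use of pseudo-inverses (which coincide with the inverses in (10) and (14) whenever the matrices involved have full rank) are all choices made for this example.

```python
import numpy as np

def m_igls(Y, K, B0=None, rng=None, max_iter=200):
    """Alternate (10) and (14) until the bit matrix stops changing.

    Y: (L, M) stego data with M >= L; returns (B_hat, V_hat).
    """
    rng = np.random.default_rng() if rng is None else rng
    L, M = Y.shape
    R_y = Y @ Y.T / M                              # sample autocorrelation
    Ry_inv_Y = np.linalg.solve(R_y, Y)             # R_y^{-1} Y, reused below
    B = (rng.choice([-1.0, 1.0], size=(K, M)) if B0 is None
         else B0.astype(float))                    # arbitrary or supplied start
    for _ in range(max_iter):
        # (10): V = Y B^T (B B^T)^{-1}, written with the pseudo-inverse of B
        V = Y @ np.linalg.pinv(B)
        # (14): B = sgn{(V^T R_y^{-1} V)^{-1} V^T R_y^{-1} Y}
        G = V.T @ np.linalg.solve(R_y, V)
        B_new = np.sign(np.linalg.pinv(G) @ (V.T @ Ry_inv_Y))
        B_new[B_new == 0] = 1.0                    # assumption: break exact ties
        if np.array_equal(B_new, B):               # convergence test
            break
        B = B_new
    return B, V

# Toy run on synthetic data (sizes and noise level are illustrative)
rng = np.random.default_rng(2)
L, K, M = 8, 2, 500
V_true = rng.standard_normal((L, K))
B_true = rng.choice([-1.0, 1.0], size=(K, M))
Y = V_true @ B_true + 0.1 * rng.standard_normal((L, M))
B_hat, V_hat = m_igls(Y, K, rng=rng)
```

With arbitrary initialization the returned streams are, at best, correct up to a per-stream sign (and possibly stream order), so any comparison against known bits has to account for that ambiguity.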

B.  Cross-Correlation  Enhanced  M-IGLS 

We understand that, with arbitrary initialization, convergence of the
M-IGLS procedure described in Table I to the optimal GLS solution of
(9) is not guaranteed in general. Extensive experimentation with the
algorithm in Table I indicates that, for sufficiently long messages
hidden by each signature ($M = 4$ Kbits or more, for example),
satisfactory quality message decisions $\hat{\mathbf{B}}$ can be
obtained. However, when the message size is small, M-IGLS may very well
converge to and return wrong solutions. The quality
(generalized-least-squares fit) of the end convergence point depends
heavily on the initialization point, and arbitrary initialization,
which at first sight is unavoidable for blind steganalysis, offers
little assurance that the iterative scheme will lead us to appropriate,
“reliable” (close to minimal generalized least-squares fit) solutions.
Re-initialization and re-execution of the M-IGLS procedure is always
possible, but the challenge is how to assess whether solutions returned
by the M-IGLS procedure are reliable or not without any side
information. The rest of this section is devoted to addressing this
challenge.

TABLE I
Iterative generalized least-squares SS steganalysis

1) $d := 0$; initialize $\hat{\mathbf{B}}^{(0)} \in \{\pm 1\}^{K \times M}$ arbitrarily.

2) $d := d + 1$;
   $\hat{\mathbf{V}}^{(d)} := \mathbf{Y} (\hat{\mathbf{B}}^{(d-1)})^{T} \left[ \hat{\mathbf{B}}^{(d-1)} (\hat{\mathbf{B}}^{(d-1)})^{T} \right]^{-1}$;
   $\hat{\mathbf{B}}^{(d)} := \mathrm{sgn}\left\{ \left( (\hat{\mathbf{V}}^{(d)})^{T} \hat{\mathbf{R}}_y^{-1} \hat{\mathbf{V}}^{(d)} \right)^{-1} (\hat{\mathbf{V}}^{(d)})^{T} \hat{\mathbf{R}}_y^{-1} \mathbf{Y} \right\}$.

3) Repeat Step 2 until $\hat{\mathbf{B}}^{(d)} = \hat{\mathbf{B}}^{(d-1)}$.

Since $\mathbf{B}$ and $\mathbf{V}$ are jointly detected and estimated,
correspondingly, if one is not reliable neither is the other in
general. We first examine the reliability of the bit matrix decision
$\hat{\mathbf{B}} = [\hat{\mathbf{b}}_1, \ldots, \hat{\mathbf{b}}_K]^{T}$
returned by the M-IGLS procedure of Table I. The sample
cross-correlation between any two bit streams is

$$\eta_{i,j} \triangleq \hat{\mathbf{b}}_i^{T} \hat{\mathbf{b}}_j / M, \quad i \neq j, \quad i, j = 1, \ldots, K. \tag{15}$$

Formally, the true information bits are independent within user streams
and across users. If $\eta_{i,j}$ were to be viewed as approximately
normally distributed with zero mean and variance $\frac{1}{M}$, then
the probability of $|\eta_{i,j}|$, $i \neq j$, being larger than, say,
the threshold value $\frac{3}{\sqrt{M}}$ is very low at about 0.3% (we
can calculate $\Pr(|\eta_{i,j}| > \frac{3}{\sqrt{M}}) \approx 0.003$).
Motivated by this calculation, we introduce below Criterion 1 that
classifies convergence points of the M-IGLS procedure in Table I as
“compliant” or not based on the sample statistics of the returned data
matrix $\hat{\mathbf{B}}$.

Criterion 1: If $|\eta_{i,j}| < \frac{3}{\sqrt{M}}$ for all
$i \neq j \in \{1, 2, \ldots, K\}$, then
$(\hat{\mathbf{B}}, \hat{\mathbf{V}})$ returned by the M-IGLS procedure
in Table I are classified as “Criterion-1-compliant.” ■
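Criterion 1 and the tail probability that motivates it can be checked with a short sketch. The function name `criterion1_compliant` is a hypothetical helper; the test follows (15) with the three-standard-deviation threshold $3/\sqrt{M}$, and the probability is the two-sided Gaussian tail beyond three standard deviations.

```python
import math
import numpy as np

def criterion1_compliant(B_hat):
    """True if |eta_ij| < 3/sqrt(M) for all i != j, with eta_ij = b_i^T b_j / M."""
    K, M = B_hat.shape
    eta = B_hat @ B_hat.T / M                      # all pairwise sample correlations
    off_diag = eta[~np.eye(K, dtype=bool)]         # keep only i != j
    return bool(np.all(np.abs(off_diag) < 3.0 / math.sqrt(M)))

# Two-sided Gaussian tail beyond three standard deviations (about 0.3%)
p_exceed = math.erfc(3.0 / math.sqrt(2.0))

# Toy check: two orthogonal streams pass, two identical streams fail
B_ok = np.array([[1, 1, -1, -1] * 25, [1, -1, 1, -1] * 25], dtype=float)
B_bad = np.vstack([B_ok[0], B_ok[0]])
ok, bad = criterion1_compliant(B_ok), criterion1_compliant(B_bad)
```

Two extracted streams that have locked onto the same embedded message, in either polarity, have $|\eta_{i,j}|$ close to 1 and are therefore rejected.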

Criterion 1 provides the means for coarse identification of unreliable
solutions. An unreliable convergence point would then trigger
re-initialization and re-execution of the M-IGLS procedure in Table I
until a Criterion-1-compliant point is obtained. To enhance the end
accuracy of blind hidden data extraction, we propose one additional
criterion based on the returned estimated signature matrix
$\hat{\mathbf{V}}$. We will motivate our proposal by examining
experimentally the normalized cross-correlation between the estimated
signatures $\hat{\mathbf{v}}_k$ returned by the Criterion-1-equipped
M-IGLS procedure and the true signatures $\mathbf{v}_k$,
$k = 1, \ldots, K$. We consider as a host example the gray-scale
256 x 256 “Baboon” image and perform 8 x 8 block DCT embedding by (1)
over all bins except the dc coefficient with $K = 4$ distinct arbitrary
signatures $\mathbf{s}_k \in \mathbb{R}^{63}$ and per-message
distortion $\mathcal{D}_k = 31.5$dB, $k = 1, \ldots, 4$. For the sake
of generality, we also incorporate white Gaussian noise of variance
$\sigma_n^2 = 3$dB. We run the Criterion-1-equipped M-IGLS procedure
400 times. The histogram of the normalized cross-correlation values
$\frac{\hat{\mathbf{v}}_1^{T} \mathbf{v}_1}{\|\hat{\mathbf{v}}_1\| \|\mathbf{v}_1\|}$
of the four hundred returned solutions for message $k = 1$ in Fig. 1
(representative of all other messages) reveals that Criterion 1 is not
by itself sufficient to eliminate erroneous solutions. Yet, there
exists a tight cluster/region formed by 210 or so of the
Criterion-1-equipped M-IGLS convergence points around the true
embedding signature.

Fig. 1. Histogram of normalized cross-correlation between
$\hat{\mathbf{v}}_1$ and $\mathbf{v}_1$ (256 x 256 Baboon image, 8 x 8
DCT, $L = 63$, $K = 4$, $\mathcal{D}_k = 31.5$dB, $k = 1, \ldots, 4$,
$\sigma_n^2 = 3$dB; $\hat{\mathbf{v}}_1$ returned by Table I M-IGLS
steganalysis procedure).

TABLE II
Cross-correlation Enhanced M-IGLS

For $j := 1$ to $P$
   1) Execute M-IGLS of Table I with arbitrary initialization and
      obtain estimates $\hat{\mathbf{v}}_k$, $k = 1, \ldots, K$.
   2) If estimates are Criterion-1-compliant,
      $\hat{\mathbf{v}}_k^{(j)} := \hat{\mathbf{v}}_k$, $k = 1, \ldots, K$;
      else go to 1).
End
For $k := 1$ to $K$
   3) Identify reliable estimates for $\mathbf{v}_k$ according to
      Criterion 2.
   4) Calculate the average over all reliable estimates by (18).
End
5) Set $\bar{\mathbf{V}} = [\bar{\mathbf{v}}_1, \ldots, \bar{\mathbf{v}}_K]$.
6) Execute M-IGLS of Table I with initialization
   $\hat{\mathbf{B}}^{(0)} = \mathrm{sgn}\{\bar{\mathbf{V}}^{T} \hat{\mathbf{R}}_y^{-1} \mathbf{Y}\}$.

The basic idea now behind our second and final refinement of the M-IGLS
blind hidden data extraction procedure is to identify and average these
reliable clustered estimates. Of course, identification of the reliable
estimates is not a trivial task due to our complete lack of knowledge
of $\mathbf{v}_k$ (or $\mathbf{s}_k$), $k = 1, \ldots, K$. In this
context, assume that we have $P$ estimates of $\mathbf{v}_k$ denoted by
$\hat{\mathbf{v}}_k^{(j)}$, $k = 1, \ldots, K$, $j = 1, \ldots, P$,
obtained by $P$ runs of the Criterion-1-equipped M-IGLS procedure. From
the example of Fig. 1, we understand that reliable estimates
$\hat{\mathbf{v}}_k^{(j)}$ of $\mathbf{v}_k$ have high normalized
cross-correlation (close to 1) with each other, while they will have
low normalized cross-correlation with other unreliable estimates of
$\mathbf{v}_k$. In contrast, unreliable estimates will tend to have low
normalized cross-correlation with each other. Therefore, the
reliability of $\hat{\mathbf{v}}_k^{(j)}$ may be quantified/assessed by
examining the sum-cross-correlation with the other
$\hat{\mathbf{v}}_k^{(i)}$, $i \neq j \in \{1, \ldots, P\}$,

$$\rho_k^{(j)} = \sum_{\substack{i=1 \\ i \neq j}}^{P} \frac{\left| \hat{\mathbf{v}}_k^{(j)T} \hat{\mathbf{v}}_k^{(i)} \right|}{\|\hat{\mathbf{v}}_k^{(j)}\| \, \|\hat{\mathbf{v}}_k^{(i)}\|}. \tag{16}$$


A reasonable threshold value for binary reliability classification may
be the average value

$$\bar{\rho}_k = \frac{1}{P} \sum_{j=1}^{P} \rho_k^{(j)}, \quad k = 1, \ldots, K, \tag{17}$$

utilized in the proposed Criterion 2 below.

Criterion 2: Let $\hat{\mathbf{v}}_k^{(j)}$ be the estimates of
$\mathbf{v}_k$ returned by $P$ arbitrary initializations of the
Criterion-1-equipped M-IGLS procedure of Table I, $k = 1, \ldots, K$,
$j = 1, \ldots, P$. If $\rho_k^{(j)} > \bar{\rho}_k$, then
$\hat{\mathbf{v}}_k^{(j)}$ is considered a reliable estimate of
$\mathbf{v}_k$; otherwise we declare it as unreliable. ■

Finally, we average our reliable (according to Criterion 2) estimates
of the effective signatures $\mathbf{v}_k$ to produce one last
high-quality initialization of the M-IGLS algorithm of Table I. Let
$\mathcal{S}_k$ denote the set of all reliable estimates of
$\mathbf{v}_k$ according to Criterion 2 and let $|\mathcal{S}_k|$
denote the cardinality of $\mathcal{S}_k$. Our averaged estimate of
matrix $\mathbf{V}$ is now given by $\bar{\mathbf{V}}$ with

$$\bar{\mathbf{V}} = [\bar{\mathbf{v}}_1, \ldots, \bar{\mathbf{v}}_K] \quad \text{where} \quad \bar{\mathbf{v}}_k = \frac{1}{|\mathcal{S}_k|} \sum_{\hat{\mathbf{v}}_k^{(j)} \in \mathcal{S}_k} \hat{\mathbf{v}}_k^{(j)}, \quad k = 1, \ldots, K, \tag{18}$$

i.e., $\bar{\mathbf{v}}_k$ is the average over all reliable estimates of
$\mathbf{v}_k$ according to Criterion 2. We execute M-IGLS in Table I a
final time initialized at
$\hat{\mathbf{B}}^{(0)} = \mathrm{sgn}\{\bar{\mathbf{V}}^{T} \hat{\mathbf{R}}_y^{-1} \mathbf{Y}\}$.
We call M-IGLS with both Criteria 1 and 2 incorporated
Cross-Correlation enhanced M-IGLS (CC-M-IGLS) and summarize the
complete procedure in Table II.
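The reliability scoring, thresholding, and averaging steps can be sketched as follows. Several points here are assumptions made for the illustration and are not stated in the text: the absolute value of the normalized cross-correlation is used so that the score is insensitive to the global sign ambiguity; the signs of the estimates are aligned to the highest-scoring estimate before averaging; column $k$ is assumed to refer to the same signature in every run; and if no estimate exceeds the threshold, all are kept. The function name `average_reliable` is hypothetical.

```python
import numpy as np

def average_reliable(V_runs):
    """V_runs: (P, L, K) array; V_runs[j][:, k] is the run-j estimate of v_k.

    Returns V_bar of shape (L, K) and a (P, K) boolean reliability mask.
    """
    P, L, K = V_runs.shape
    V_bar = np.empty((L, K))
    reliable = np.zeros((P, K), dtype=bool)
    for k in range(K):
        E = V_runs[:, :, k]                                # (P, L) estimates of v_k
        U = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-norm rows
        C = np.abs(U @ U.T)                                # |normalized cross-corr.|
        rho = C.sum(axis=1) - np.diag(C)                   # sum over i != j
        reliable[:, k] = rho > rho.mean()                  # threshold at the average
        if not reliable[:, k].any():                       # assumption: keep all
            reliable[:, k] = True
        # Assumption: align signs with the highest-scoring estimate before averaging
        ref = U[np.argmax(rho)]
        signs = np.where(E @ ref < 0, -1.0, 1.0)
        V_bar[:, k] = (signs[:, None] * E)[reliable[:, k]].mean(axis=0)
    return V_bar, reliable

# Toy usage: three estimates near e_1 (one sign-flipped) and two outliers
E_demo = np.array([[1, .1, 0, 0], [-1, 0, .1, 0], [1, 0, 0, .1],
                   [0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)
V_bar, reliable = average_reliable(E_demo[:, :, None])
```

In this toy case the three clustered estimates are retained and the two outliers are discarded, so the averaged vector is dominated by its first component.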

IV.  Experimental  Studies 

A rigorous and sensitive measure of the quality of an active
steganalysis solution is the difference in the bit-error-rate (BER)
experienced by the intended recipient and the steganalyst. The intended
recipient in our studies may be using any of the following three
message recovery methods: (i) standard signature matched-filtering (MF)
with known signatures $\mathbf{s}_k$, $k = 1, \ldots, K$; (ii)
sample-matrix-inversion MMSE (SMI-MMSE) filtering with known signatures
$\mathbf{s}_k$ and estimated autocorrelation matrix
$\hat{\mathbf{R}}_y$ (see (3)); (iii) ideal MMSE filtering with known
signatures $\mathbf{s}_k$ and known true host autocorrelation matrix
$\mathbf{R}_x$, which serves as the ultimate performance bound
reference for all methods. In terms of blind active steganalysis
(neither $\mathbf{s}_k$ nor $\mathbf{R}_x$ known), we will examine (iv)
the developed M-IGLS algorithm in Table I alone and (v) CC-M-IGLS of
Table II with $P = 20$ Criterion-1 runs. Finally, the performance of
two typical ICA-based blind signal separation (BSS) algorithms, (vi)
FastICA [13] and (vii) JADE [14], will also be included in the studies
for comparison purposes.
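For the blind methods, the BER can only be measured after the global sign of each extracted stream has been resolved. A minimal sketch of such a measurement follows; it assumes that the rows of the extracted matrix are already matched to the rows of the true matrix, and it resolves each sign with full knowledge of the true bits (an evaluation shortcut, more optimistic than correcting the sign from a few known symbols). The function name `sign_corrected_ber` is hypothetical.

```python
import numpy as np

def sign_corrected_ber(B_true, B_hat):
    """Per-message BER after choosing the better global sign for each row."""
    err = np.mean(B_true != B_hat, axis=1)     # raw error rate per message
    return np.minimum(err, 1.0 - err)          # flipping a whole row maps err to 1 - err

# Toy usage: row 0 is mostly inverted, row 1 has a single error
B_true = np.array([[1, -1, 1, -1], [1, 1, 1, 1]], dtype=float)
B_hat = np.array([[-1, 1, -1, -1], [1, 1, -1, 1]], dtype=float)
ber_per_message = sign_corrected_ber(B_true, B_hat)
average_ber = ber_per_message.mean()
```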

We consider as a host example the gray-scale 512 x 512 “Baboon” image.
We perform 8 x 8 block DCT embedding by (1) over all bins except the dc
coefficient with $K = 4$ distinct arbitrary signatures
$\mathbf{s}_k \in \mathbb{R}^{63}$, $k = 1, \ldots, 4$. The hidden
message embedded by each signature is $\frac{512^2}{8^2} = 4{,}096$
bits long. The per-message mean square distortion due to each embedded
message is set to be the same for all messages, i.e.,
$\mathcal{D}_k = A_k^2 = \mathcal{D}$, $k = 1, \ldots, 4$. For the sake
of generality, we also incorporate white Gaussian noise of variance
$\sigma_n^2 = 3$dB. Fig. 2 shows the average BER (over all $K = 4$
messages) of all methods (i) through (vii) listed above as a function
of the host distortion per message. While the
independent/principal-component methods (FastICA and JADE) are failing
to carry out effective active SS image steganalysis, to our
satisfaction CC-M-IGLS SS steganalysis is rather close in BER
performance to the ideal MMSE detector bound for which the embedding
signatures and the clean host autocorrelation matrix $\mathbf{R}_x$ are
perfectly known. It could be argued that for this host and rather large
size of $M = 4{,}096$ bits per message, CC-M-IGLS offers only a
moderate gain in comparison with M-IGLS of Table I by itself.

Fig. 2. Average BER versus per-message distortion (512 x 512 Baboon,
$L = 63$, $K = 4$ messages of 4 Kbits each, $\sigma_n^2 = 3$dB).

Fig. 3. Average BER versus per-message distortion (256 x 256 Baboon,
$L = 63$, $K = 4$ messages of 1 Kbit each, $\sigma_n^2 = 3$dB).

In Fig. 3, however, we repeat the exact same experimental study on the
smaller 256 x 256 version of the Baboon image with $K = 4$ hidden
messages of length only $\frac{256^2}{8^2} = 1{,}024$ bits per message.
CC-M-IGLS now provides dramatic performance improvement over M-IGLS,
which would justify the extra computational cost and extraction delay.
At the same time, comparing with Fig. 2, the gap between CC-M-IGLS and
ideal MMSE increases as the hidden message size decreases.

V.  Conclusions 

In this paper we considered the problem of recovering unknown messages
hidden in digital media hosts via multi-signature spread-spectrum
embedding. Neither the original host nor the embedding signatures are
assumed available. We first developed a low complexity multi-signature
iterative generalized least-squares (M-IGLS) core algorithm.
Cross-correlation enhanced M-IGLS (CC-M-IGLS), a procedure based on
statistical analysis of repeated independent M-IGLS processing of the
host, is seen to offer most effective blind hidden message recovery and
presents itself as an effective countermeasure to conventional$^3$ SS
data hiding.